The goal of this file is to arrange gene-centric Caenorhbaditis RNA-seq data originally published as part of the modENCODE project or by the Waterston Lab at the University of Washington (Gerstein et al 2010, Gerstein et al 2014, and Warner et al 2019).
Each genome and gff file were downloaded from WormBase.org version WS290. Reads were aligned to each genome using STAR (v2.7.6a, –alignIntronMax 30000 –alignMatesGapMax 30000) and the species-specific WS290 GTF file for each genome. PCR duplicates were removed using “seldup” (Warner et al., 2019). Read counts were obtained for each gene (CDS region only which is labeled as “CDS” in C. briggsae and C. elegans and as “coding_exon” for C. remanei, C. japonica, and C. brenneri) using featureCounts (from the software package subread-2.0.6) using default settings. Only uniquely mapping reads were counted. Additionally read counts were obtained for the CDS regions and for the full transcripts using the featureCount options -M –fraction so that multimappers were counted by splitting them equally among all of the locations where they aligned. Alignment and counting was performed by LaDeana Hillier (Waterston Lab, UW).
Read data for each species was imported into R and annotated with information from WormBase ParaSite BiomaRT.
Raw reads were quantified as counts per million using the EdgeR package, then filtered to remove transcripts with low counts (less than 1 count-per-million). A list of discarded genes and their expression values across life stages was saved. Non-discarded gene values were normalized using the trimmed mean of M-values method (TMM, Robinson and Oshlack ) to permit between-samples comparisons. The mean-variance relationship was modeled using a precision weights approach Law et al 2014.
This document saves multiple files that are passed to a Shiny Web App for downstream browsing and on-demand analysis. Note that these files are saved in an Outputs folder; in order to make them accessible to a local version of the Shiny browser they need to be moved to appropriate subfolders within the App folder - the www sub folder (for .csv files) or the Data subfolder (for R objects). Stable copies are already located within those folders and do not need to be replaced unless the pre-processing steps change.
Note: Code chunks are collated and echoed at the end of the document in Appendix I.
Generate a digital gene expression list that could be easily shared/loaded for downstream filtering/normalization.
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica', 'cele_embryonic'))
species_list <- tibble(species = c('cele_embryonic'))
for (x in species_list$species) {
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# load pre-generated annotation information
load(paste0("./Outputs/",x,"_geneAnnotations"))
# import featureCount output into R ----
if (x == 'elegans' | x == 'briggsae') {
path <- paste0("./Data/", x, "/featureCount.C_", x,".", targets$Biological_ID, ".CDS.unique_only.ws290.txt")
} else if (x == 'cele_embryonic') {
path <- paste0("./Data/", x, "/featureCount.C_elegans.", targets$Biological_ID, ".CDS.unique_only.ws290.txt")
} else {
path <- paste0("./Data/", x, "/featureCount.C_", x,".", targets$Biological_ID, ".coding_exon.unique_only.ws290.txt")
}
featureCountData<- rbindlist(lapply(path, fread), idcol="sample") %>%
mutate(sample = targets$sample[sample])
colnames(featureCountData)<-c('sample','geneID', 'geneName', 'stableID', "location", "length", "count")
featureCountData_wider <- featureCountData %>%
dplyr::select(!c(location, length)) %>%
pivot_wider(names_from = sample, values_from = count)
counts <- featureCountData_wider %>%
dplyr::select(-stableID)%>%
column_to_rownames(var = "geneID")
annotations_sub<-dplyr::select(featureCountData_wider, geneID) %>%
left_join(annotations, by = "geneID")
# generate a DGEList
myDGEList <- DGEList(counts,
samples = targets$sample,
group = targets$group,
genes = annotations_sub)
output.name <- paste0(x, '_DGEList')
save(myDGEList,
file = file.path(output.path,
output.name))
}
The goal of this chunk is to:
ggplot2 to visualize the impact of filtering and
normalization on the data._vDGEList). iv) a matrix of variance-stabilized gene
expression data, extracted from the vDGEList
(_log2cpm_filtered_norm_voom.csv) - this data is
downloadable from within the Browser App.species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica', 'cele_embryonic'))
for (x in species_list$species) {
# load pre-generated DGEList information
load(paste0("./Outputs/",x,"_DGEList"))
# load pre-generated annotation information
load(paste0("./Outputs/",x,"_geneAnnotations"))
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# Generate life stage IDs
ids <- rep(cbind(targets$group),
times = nrow(myDGEList$counts)) %>%
as_factor()
# calculate and plot log2 counts per million ----
# use the 'cpm' function from EdgeR to get log2 counts per million
# then coerce into a tibble
log2.cpm.df.pivot <-cpm(myDGEList, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids)
# plot the data
p1 <- ggplot(log2.cpm.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha= 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="unfiltered, non-normalized",
fill= "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
coord_flip()
if (x == 'cele_embryonic') {
p1 <- p1 +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Filter the data ----
# filter genes/transcripts with low counts
# how many genes had more than 1 CPM (TRUE) in at least n samples
# Note: The cutoff "n" is adjusted for the number of
# samples in the smallest group of comparison.
keepers <- cpm(myDGEList) %>%
rowSums(.>1)>=1
myDGEList.filtered <- myDGEList[keepers,]
ids.filtered <- rep(cbind(targets$group),
times = nrow(myDGEList.filtered)) %>%
as_factor()
log2.cpm.filtered.df.pivot <- cpm(myDGEList.filtered, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.filtered)
p2 <- ggplot(log2.cpm.filtered.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha= 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="filtered, non-normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
coord_flip()
if (x == 'cele_embryonic') {
p2 <- p2 +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Look at the genes excluded by the filtering step ----
# just to check that there aren't any with
# high expression that are in few samples
# Discarded genes
myDGEList.discarded <- myDGEList[!keepers,]
ids.discarded <- rep(cbind(targets$group),
times = nrow(myDGEList.discarded)) %>%
as_factor()
log2.cpm.discarded.df.pivot <- cpm(myDGEList.discarded, log=F) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.discarded)
# Genes that are above 1 cpm
log2.cpm.discarded.df.pivot %>%
dplyr::filter(expression > 1)
# Generate a matrix of discarded genes and their raw counts ----
discarded.gene.df <- log2.cpm.discarded.df.pivot %>%
pivot_wider(names_from = c(life_stage, samples),
names_sep = "-",
values_from = expression,
id_cols = geneID)%>%
left_join(annotations, by = "geneID")
# Save a matrix of discarded genes and their raw counts ----
output.name <- paste0(x, '_discardedGenes.csv')
discarded.gene.df %>%
write.csv(file = file.path(output.path,
output.name))
# Plot discarded genes
p.discarded <- ggplot(log2.cpm.discarded.df.pivot) +
aes(x=samples, y=expression, color=life_stage) +
geom_jitter(alpha = 0.3, show.legend = T)+
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="expression", x = "sample",
title = paste0("C. ", x, ": Counts per Million (CPM)"),
subtitle="genes excluded by low count filtering step, non-normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
if (x == 'cele_embryonic') {
p.discarded <- p.discarded +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Normalize the data using a between samples normalization ----
# Source for TMM sample normalization here:
# https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25
myDGEList.filtered.norm <- calcNormFactors(myDGEList.filtered, method = "TMM")
log2.cpm.filtered.norm <- cpm(myDGEList.filtered.norm, log=TRUE)
log2.cpm.filtered.norm.df<- cpm(myDGEList.filtered.norm, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample))
log2.cpm.filtered.norm.df.pivot<-log2.cpm.filtered.norm.df %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.filtered)
p3 <- ggplot(log2.cpm.filtered.norm.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha = 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="filtered, TMM normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
if (x == 'cele_embryonic') {
p3 <- p3 +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Compute Variance-Stabilized DGEList Object ----
# Set up the design matrix ----
# for C. elegans embryonic timeline, use a blocking design to account for batch effects
if (x == 'cele_embryonic') {
group <- factor(targets$group)
block <- factor(targets$block) #do blocking by polya+ and wholeRNA (by sample date)
design <- model.matrix(~block + group)
} else{
# no intercept/blocking for matrix, comparisons across group
group <- factor(targets$group)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)}
# Model mean-variance trend and fit linear model to data ----
# Use VOOM function from Limma package to model the mean-variance relationship
# produces a variance-stabilized DGEList, that include precision
# weights for each gene to try and control for heteroscedasity.
# transforms count data to log2-counts per million
# Outputs: E = normalized expression values on the log2 scale
v.DGEList.filtered.norm <- voom(counts = myDGEList.filtered.norm,
design = design, plot = T)
colnames(v.DGEList.filtered.norm)<-targets$sample
if (x == 'cele_embryonic') {
colnames(v.DGEList.filtered.norm$E) <- paste(targets$block,targets$group,
targets$sample,sep = '-')
}else {
colnames(v.DGEList.filtered.norm$E) <- paste(targets$group,
targets$sample,sep = '-')}
# Perform multivariate analysis on the data
# Identify variables of interest in study design file ----
group <- factor(targets$group)
# Hierarchical clustering ---------------
# Remember: hierarchical clustering can only work on a data matrix, not a data frame
# Calculate distance matrix
# dist calculates distance between rows, so transpose data so that we get distance between samples.
# how similar are samples from each other
colnames(log2.cpm.filtered.norm)<- targets$group
distance <- dist(t(log2.cpm.filtered.norm), method = "maximum") #other distance methods are "euclidean", maximum", "manhattan", "canberra", "binary" or "minkowski"
# Calculate clusters to visualize differences. This is the hierarchical clustering.
# The methods here include: single (i.e. "friends-of-friends"), complete (i.e. complete linkage), and average (i.e. UPGMA). Here's a comparison of different types: https://en.wikipedia.org/wiki/UPGMA#Comparison_with_other_linkages
clusters <- hclust(distance, method = "complete") #other agglomeration methods are "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", or "centroid"
dend <- as.dendrogram(clusters)
p4<-dend %>%
#dendextend::set("branches_k_color", k = 5) %>%
dendextend::set("hang_leaves", c(0.1)) %>%
dendextend::set("labels_cex", c(0.5)) %>%
#dendextend::set("labels_colors", k = 5) %>%
dendextend::set("branches_lwd", c(0.7)) %>%
as.ggdend %>%
ggplot (offset_labels = -0.5) +
theme_dendro() +
ylim(-8, max(get_branches_heights(dend))) +
labs(title = "Hierarchical Cluster Dendrogram ",
subtitle = "filtered, TMM normalized",
y = "Distance",
x = "Life stage") +
coord_fixed(1/2) +
theme(axis.title.x = element_text(color = "black"),
axis.title.y = element_text(angle = 90),
axis.text.y = element_text(angle = 0),
axis.line.y = element_line(color = "black"),
axis.ticks.y = element_line(color = "black"),
axis.ticks.length.y = unit(2, "mm"))
# Principal component analysis (PCA) -------------
# this also works on a data matrix, not a data frame
pca.res <- prcomp(t(log2.cpm.filtered.norm), scale.=F, retx=T)
summary(pca.res) # Prints variance summary for all principal components.
#pca.res$rotation #$rotation shows you how much each gene influenced each PC (called 'scores')
#pca.res$x # 'x' shows you how much each sample influenced each PC (called 'loadings')
#note that these have a magnitude and a direction (this is the basis for making a PCA plot)
## This generates a screeplot: a standard way to view eigenvalues for each PCA. Shows the proportion of variance accounted for by each PC. Plotting only the first 10 dimensions.
p5<-fviz_eig(pca.res,
#barcolor = brewer.pal(8,"Pastel2")[8],
#barfill = brewer.pal(8,"Pastel2")[8],
linecolor = "black",
main = "Scree plot: proportion of variance accounted for by each principal component", ggtheme = theme_bw())
pc.var<-pca.res$sdev^2 # sdev^2 captures these eigenvalues from the PCA result
pc.per<-round(pc.var/sum(pc.var)*100, 1) # we can then use these eigenvalues to calculate the percentage variance explained by each PC
# Visualize the PCA result ------------------
#lets first plot any two PCs against each other
#We know how much each sample contributes to each PC (loadings), so let's plot
pca.res.df <- as_tibble(pca.res$x[,1:2]) %>%
add_column(group = targets$group)
if (x == 'cele_embryonic') {
pca.res.df <- add_column(pca.res.df, block = targets$block)}
# Plotting PC1 and PC2
if (x == 'cele_embryonic') {
p6<-ggplot(pca.res.df) +
aes(x=PC1, y=PC2,
shape = block,
label = group,
fill = group,
color = group
) +
geom_point(size=4) +
xlab(paste0("PC1 (",pc.per[1],"%",")")) +
ylab(paste0("PC2 (",pc.per[2],"%",")")) +
labs(title="Principal Components Analysis",
sub = "Note: analysis is blind to sample identity.",
shape = "Experiment",
fill = "Time point",
color = "Time point") +
scale_x_continuous(expand = c(.3, .3)) +
scale_y_continuous(expand = c(.3, .3)) +
coord_fixed() +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10))
} else {
p6<-ggplot(pca.res.df) +
aes(x=PC1, y=PC2,
label = group,
fill = group,
color = group
) +
geom_point(size=4) +
xlab(paste0("PC1 (",pc.per[1],"%",")")) +
ylab(paste0("PC2 (",pc.per[2],"%",")")) +
labs(title="Principal Components Analysis",
sub = "Note: analysis is blind to sample identity.",
fill = "Life stage",
color = "Life stage") +
scale_x_continuous(expand = c(.3, .3)) +
scale_y_continuous(expand = c(.3, .3)) +
coord_fixed() +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10))
}
# Create a PCA 'small multiples' chart ----
if (ncol(pca.res$x)>5){
pca.res.df <- pca.res$x[,1:6]}
else {
pca.res.df <- pca.res$x[,1:ncol(pca.res$x)]
}
pca.res.df <- pca.res.df %>%
as_tibble() %>%
add_column(sample = targets$sample,
group = targets$group)
pca.pivot <- pivot_longer(pca.res.df, # dataframe to be pivoted
cols = PC1:PC3, # column names to be stored as a SINGLE variable
names_to = "PC", # name of that new variable (column)
values_to = "loadings") # name of new variable (column) storing all the values (data)
PC1<-subset(pca.pivot, PC == "PC1")
PC2 <-subset(pca.pivot, PC == "PC2")
PC3 <- subset(pca.pivot, PC == "PC3")
#PC4 <- subset(pca.pivot, PC == "PC4")
# New facet label names for PCs
PC.labs <- c(paste0("PC1 (",pc.per[1],"%",")"),
paste0("PC2 (",pc.per[2],"%",")"),
paste0("PC3 (",pc.per[3],"%",")")
)
names(PC.labs) <- c("PC1", "PC2", "PC3")
p7<-ggplot(pca.pivot) +
aes(x=sample, y=loadings) + # you could iteratively 'paint' different co-variates onto this plot using the 'fill' aes
geom_bar(stat="identity") +
facet_wrap(~PC, labeller = labeller(PC = PC.labs)) +
geom_bar(data = PC1, stat = "identity", aes(fill = group)) +
geom_bar(data = PC2, stat = "identity", aes(fill = group)) +
geom_bar(data = PC3, stat = "identity", aes(fill = group)) +
labs(title="PCA 'small multiples' plot",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
scale_x_discrete(limits = targets$sample, labels = targets$group) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10))+
coord_flip()
if (x == 'cele_embryonic') {
p7 <- p7 +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Save outputs
output.name <- paste0(x, '_FilteringNormalizationGraphs')
save(p1, p2, p3, p4, p5, p6, p7, p.discarded, discarded.gene.df,
file = file.path(output.path,
output.name))
output.name <- paste0(x, '_DGEList_filtered_normalized')
save(myDGEList.filtered.norm,
file = file.path(output.path,
output.name))
# Save matrix of genes and their filtered, normalized, voom-transformed counts ----
# This is the count data that underlies the differential expression analyses in the Shiny app.
# Saving it here so that users of the app can access the input information.
output.name <- paste0(x, '_log2cpm_filtered_norm_voom.csv')
# Save in Shiny app download (www) folder
write.csv(v.DGEList.filtered.norm$E,
file = file.path(www.path,
output.name))
# Save v.DGEList ----
# Save in Shiny app Data folder
output.name <- paste0(x, '_vDGEList')
save(v.DGEList.filtered.norm,
file = file.path(app.path,
output.name))
}
Clustering performed on filtered and normalized abundance data using
the “complete” method.
A scree plot is a standard way to view eigenvalues for each PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
Clustering performed on filtered and normalized abundance data using
the “complete” method.
A scree plot is a standard way to view eigenvalues for each PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
Clustering performed on filtered and normalized abundance data using
the “complete” method.
A scree plot is a standard way to view eigenvalues for each PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
Clustering performed on filtered and normalized abundance data using
the “complete” method.
A scree plot is a standard way to view eigenvalues for each PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
Clustering performed on filtered and normalized abundance data using
the “complete” method.
A scree plot is a standard way to view eigenvalues for each PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
Clustering performed on filtered and normalized abundance data using
the “complete” method.
A scree plot is a standard way to view eigenvalues for each PCA. The
plot shows the proportion of variance accounted for by each PC.
Plot of the samples in PCA space. Fill color indicates life
stage.
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE,
collapse = TRUE
)
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!require("tximport", quietly = TRUE)) BiocManager::install("tximport", ask = FALSE)
if (!require("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load("tidyverse","data.table", "magrittr","edgeR","matrixStats","cowplot","ggthemes","gprofiler2","limma","tximport", "knitr", "ggforce", "ggdendro", "factoextra", "gridExtra", "dendextend")
# Check for presence of output folder, generate if it doesn't exist
output.path <- "./Outputs"
if (!dir.exists(output.path)){
dir.create(output.path)
}
app.path <-"../Data"
www.path <-"../www"
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica', 'cele_embryonic'))
species_list <- tibble(species = c('cele_embryonic'))
for (x in species_list$species) {
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# load pre-generated annotation information
load(paste0("./Outputs/",x,"_geneAnnotations"))
# import featureCount output into R ----
if (x == 'elegans' | x == 'briggsae') {
path <- paste0("./Data/", x, "/featureCount.C_", x,".", targets$Biological_ID, ".CDS.unique_only.ws290.txt")
} else if (x == 'cele_embryonic') {
path <- paste0("./Data/", x, "/featureCount.C_elegans.", targets$Biological_ID, ".CDS.unique_only.ws290.txt")
} else {
path <- paste0("./Data/", x, "/featureCount.C_", x,".", targets$Biological_ID, ".coding_exon.unique_only.ws290.txt")
}
featureCountData<- rbindlist(lapply(path, fread), idcol="sample") %>%
mutate(sample = targets$sample[sample])
colnames(featureCountData)<-c('sample','geneID', 'geneName', 'stableID', "location", "length", "count")
featureCountData_wider <- featureCountData %>%
dplyr::select(!c(location, length)) %>%
pivot_wider(names_from = sample, values_from = count)
counts <- featureCountData_wider %>%
dplyr::select(-stableID)%>%
column_to_rownames(var = "geneID")
annotations_sub<-dplyr::select(featureCountData_wider, geneID) %>%
left_join(annotations, by = "geneID")
# generate a DGEList
myDGEList <- DGEList(counts,
samples = targets$sample,
group = targets$group,
genes = annotations_sub)
output.name <- paste0(x, '_DGEList')
save(myDGEList,
file = file.path(output.path,
output.name))
}
species_list <- tibble(species = c('elegans', 'briggsae', 'brenneri', 'remanei', 'japonica', 'cele_embryonic'))
for (x in species_list$species) {
# load pre-generated DGEList information
load(paste0("./Outputs/",x,"_DGEList"))
# load pre-generated annotation information
load(paste0("./Outputs/",x,"_geneAnnotations"))
# read in the study design ----
targets <- read_tsv(paste0("./Data/", x, "/", x,"_study_design.txt"),
na = c("", "NA", "na"), show_col_types = F)
# Generate life stage IDs
ids <- rep(cbind(targets$group),
times = nrow(myDGEList$counts)) %>%
as_factor()
# calculate and plot log2 counts per million ----
# use the 'cpm' function from EdgeR to get log2 counts per million
# then coerce into a tibble
log2.cpm.df.pivot <-cpm(myDGEList, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids)
# plot the data
p1 <- ggplot(log2.cpm.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha= 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="unfiltered, non-normalized",
fill= "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
coord_flip()
if (x == 'cele_embryonic') {
p1 <- p1 +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Filter the data ----
# filter genes/transcripts with low counts
# how many genes had more than 1 CPM (TRUE) in at least n samples
# Note: The cutoff "n" is adjusted for the number of
# samples in the smallest group of comparison.
keepers <- cpm(myDGEList) %>%
rowSums(.>1)>=1
myDGEList.filtered <- myDGEList[keepers,]
ids.filtered <- rep(cbind(targets$group),
times = nrow(myDGEList.filtered)) %>%
as_factor()
log2.cpm.filtered.df.pivot <- cpm(myDGEList.filtered, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.filtered)
p2 <- ggplot(log2.cpm.filtered.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha= 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="filtered, non-normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
coord_flip()
if (x == 'cele_embryonic') {
p2 <- p2 +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Look at the genes excluded by the filtering step ----
# just to check that there aren't any with
# high expression that are in few samples
# Discarded genes
myDGEList.discarded <- myDGEList[!keepers,]
ids.discarded <- rep(cbind(targets$group),
times = nrow(myDGEList.discarded)) %>%
as_factor()
log2.cpm.discarded.df.pivot <- cpm(myDGEList.discarded, log=F) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample)) %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.discarded)
# Genes that are above 1 cpm
log2.cpm.discarded.df.pivot %>%
dplyr::filter(expression > 1)
# Generate a matrix of discarded genes and their raw counts ----
discarded.gene.df <- log2.cpm.discarded.df.pivot %>%
pivot_wider(names_from = c(life_stage, samples),
names_sep = "-",
values_from = expression,
id_cols = geneID)%>%
left_join(annotations, by = "geneID")
# Save a matrix of discarded genes and their raw counts ----
output.name <- paste0(x, '_discardedGenes.csv')
discarded.gene.df %>%
write.csv(file = file.path(output.path,
output.name))
# Plot discarded genes
p.discarded <- ggplot(log2.cpm.discarded.df.pivot) +
aes(x=samples, y=expression, color=life_stage) +
geom_jitter(alpha = 0.3, show.legend = T)+
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="expression", x = "sample",
title = paste0("C. ", x, ": Counts per Million (CPM)"),
subtitle="genes excluded by low count filtering step, non-normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
if (x == 'cele_embryonic') {
p.discarded <- p.discarded +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Normalize the data using a between samples normalization ----
# Source for TMM sample normalization here:
# https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25
myDGEList.filtered.norm <- calcNormFactors(myDGEList.filtered, method = "TMM")
log2.cpm.filtered.norm <- cpm(myDGEList.filtered.norm, log=TRUE)
log2.cpm.filtered.norm.df<- cpm(myDGEList.filtered.norm, log=TRUE) %>%
as_tibble(rownames = "geneID") %>%
setNames(nm = c("geneID", targets$sample))
log2.cpm.filtered.norm.df.pivot<-log2.cpm.filtered.norm.df %>%
pivot_longer(cols = -geneID,
names_to = "samples",
values_to = "expression") %>%
add_column(life_stage = ids.filtered)
p3 <- ggplot(log2.cpm.filtered.norm.df.pivot) +
aes(x=samples, y=expression, fill=life_stage) +
geom_violin(trim = FALSE, show.legend = T, alpha = 0.7) +
stat_summary(fun = "median",
geom = "point",
shape = 20,
size = 2,
color = "black",
show.legend = FALSE) +
labs(y="log2 expression", x = "sample",
title = paste0("C. ", x, ": Log2 Counts per Million (CPM)"),
subtitle="filtered, TMM normalized",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10),
axis.text.y = element_text (size = 5)) +
coord_flip()
if (x == 'cele_embryonic') {
p3 <- p3 +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Compute Variance-Stabilized DGEList Object ----
# Set up the design matrix ----
# for C. elegans embryonic timeline, use a blocking design to account for batch effects
if (x == 'cele_embryonic') {
group <- factor(targets$group)
block <- factor(targets$block) #do blocking by polya+ and wholeRNA (by sample date)
design <- model.matrix(~block + group)
} else{
# no intercept/blocking for matrix, comparisons across group
group <- factor(targets$group)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)}
# Model mean-variance trend and fit linear model to data ----
# Use VOOM function from Limma package to model the mean-variance relationship
# produces a variance-stabilized DGEList, that include precision
# weights for each gene to try and control for heteroscedasity.
# transforms count data to log2-counts per million
# Outputs: E = normalized expression values on the log2 scale
v.DGEList.filtered.norm <- voom(counts = myDGEList.filtered.norm,
design = design, plot = T)
colnames(v.DGEList.filtered.norm)<-targets$sample
if (x == 'cele_embryonic') {
colnames(v.DGEList.filtered.norm$E) <- paste(targets$block,targets$group,
targets$sample,sep = '-')
}else {
colnames(v.DGEList.filtered.norm$E) <- paste(targets$group,
targets$sample,sep = '-')}
# Perform multivariate analysis on the data
# Identify variables of interest in study design file ----
group <- factor(targets$group)
# Hierarchical clustering ---------------
# Remember: hierarchical clustering can only work on a data matrix, not a data frame
# Calculate distance matrix
# dist calculates distance between rows, so transpose data so that we get distance between samples.
# how similar are samples from each other
colnames(log2.cpm.filtered.norm)<- targets$group
distance <- dist(t(log2.cpm.filtered.norm), method = "maximum") #other distance methods are "euclidean", maximum", "manhattan", "canberra", "binary" or "minkowski"
# Calculate clusters to visualize differences. This is the hierarchical clustering.
# The methods here include: single (i.e. "friends-of-friends"), complete (i.e. complete linkage), and average (i.e. UPGMA). Here's a comparison of different types: https://en.wikipedia.org/wiki/UPGMA#Comparison_with_other_linkages
clusters <- hclust(distance, method = "complete") #other agglomeration methods are "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", or "centroid"
dend <- as.dendrogram(clusters)
p4<-dend %>%
#dendextend::set("branches_k_color", k = 5) %>%
dendextend::set("hang_leaves", c(0.1)) %>%
dendextend::set("labels_cex", c(0.5)) %>%
#dendextend::set("labels_colors", k = 5) %>%
dendextend::set("branches_lwd", c(0.7)) %>%
as.ggdend %>%
ggplot (offset_labels = -0.5) +
theme_dendro() +
ylim(-8, max(get_branches_heights(dend))) +
labs(title = "Hierarchical Cluster Dendrogram ",
subtitle = "filtered, TMM normalized",
y = "Distance",
x = "Life stage") +
coord_fixed(1/2) +
theme(axis.title.x = element_text(color = "black"),
axis.title.y = element_text(angle = 90),
axis.text.y = element_text(angle = 0),
axis.line.y = element_line(color = "black"),
axis.ticks.y = element_line(color = "black"),
axis.ticks.length.y = unit(2, "mm"))
# Principal component analysis (PCA) -------------
# this also works on a data matrix, not a data frame
pca.res <- prcomp(t(log2.cpm.filtered.norm), scale.=F, retx=T)
summary(pca.res) # Prints variance summary for all principal components.
#pca.res$rotation #$rotation shows you how much each gene influenced each PC (called 'scores')
#pca.res$x # 'x' shows you how much each sample influenced each PC (called 'loadings')
#note that these have a magnitude and a direction (this is the basis for making a PCA plot)
## This generates a screeplot: a standard way to view eigenvalues for each PCA. Shows the proportion of variance accounted for by each PC. Plotting only the first 10 dimensions.
p5<-fviz_eig(pca.res,
#barcolor = brewer.pal(8,"Pastel2")[8],
#barfill = brewer.pal(8,"Pastel2")[8],
linecolor = "black",
main = "Scree plot: proportion of variance accounted for by each principal component", ggtheme = theme_bw())
pc.var<-pca.res$sdev^2 # sdev^2 captures these eigenvalues from the PCA result
pc.per<-round(pc.var/sum(pc.var)*100, 1) # we can then use these eigenvalues to calculate the percentage variance explained by each PC
# Visualize the PCA result ------------------
#lets first plot any two PCs against each other
#We know how much each sample contributes to each PC (loadings), so let's plot
pca.res.df <- as_tibble(pca.res$x[,1:2]) %>%
add_column(group = targets$group)
if (x == 'cele_embryonic') {
pca.res.df <- add_column(pca.res.df, block = targets$block)}
# Plotting PC1 and PC2
if (x == 'cele_embryonic') {
p6<-ggplot(pca.res.df) +
aes(x=PC1, y=PC2,
shape = block,
label = group,
fill = group,
color = group
) +
geom_point(size=4) +
xlab(paste0("PC1 (",pc.per[1],"%",")")) +
ylab(paste0("PC2 (",pc.per[2],"%",")")) +
labs(title="Principal Components Analysis",
sub = "Note: analysis is blind to sample identity.",
shape = "Experiment",
fill = "Time point",
color = "Time point") +
scale_x_continuous(expand = c(.3, .3)) +
scale_y_continuous(expand = c(.3, .3)) +
coord_fixed() +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10))
} else {
p6<-ggplot(pca.res.df) +
aes(x=PC1, y=PC2,
label = group,
fill = group,
color = group
) +
geom_point(size=4) +
xlab(paste0("PC1 (",pc.per[1],"%",")")) +
ylab(paste0("PC2 (",pc.per[2],"%",")")) +
labs(title="Principal Components Analysis",
sub = "Note: analysis is blind to sample identity.",
fill = "Life stage",
color = "Life stage") +
scale_x_continuous(expand = c(.3, .3)) +
scale_y_continuous(expand = c(.3, .3)) +
coord_fixed() +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10))
}
# Create a PCA 'small multiples' chart ----
if (ncol(pca.res$x)>5){
pca.res.df <- pca.res$x[,1:6]}
else {
pca.res.df <- pca.res$x[,1:ncol(pca.res$x)]
}
pca.res.df <- pca.res.df %>%
as_tibble() %>%
add_column(sample = targets$sample,
group = targets$group)
pca.pivot <- pivot_longer(pca.res.df, # dataframe to be pivoted
cols = PC1:PC3, # column names to be stored as a SINGLE variable
names_to = "PC", # name of that new variable (column)
values_to = "loadings") # name of new variable (column) storing all the values (data)
PC1<-subset(pca.pivot, PC == "PC1")
PC2 <-subset(pca.pivot, PC == "PC2")
PC3 <- subset(pca.pivot, PC == "PC3")
#PC4 <- subset(pca.pivot, PC == "PC4")
# New facet label names for PCs
PC.labs <- c(paste0("PC1 (",pc.per[1],"%",")"),
paste0("PC2 (",pc.per[2],"%",")"),
paste0("PC3 (",pc.per[3],"%",")")
)
names(PC.labs) <- c("PC1", "PC2", "PC3")
p7<-ggplot(pca.pivot) +
aes(x=sample, y=loadings) + # you could iteratively 'paint' different co-variates onto this plot using the 'fill' aes
geom_bar(stat="identity") +
facet_wrap(~PC, labeller = labeller(PC = PC.labs)) +
geom_bar(data = PC1, stat = "identity", aes(fill = group)) +
geom_bar(data = PC2, stat = "identity", aes(fill = group)) +
geom_bar(data = PC3, stat = "identity", aes(fill = group)) +
labs(title="PCA 'small multiples' plot",
fill = "Life stage",
caption=paste0("produced on ", Sys.time())) +
scale_x_discrete(limits = targets$sample, labels = targets$group) +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10))+
coord_flip()
if (x == 'cele_embryonic') {
p7 <- p7 +
labs(fill = "Time point") +
theme(axis.text.y = element_text (size = 5))
}
# Save outputs
output.name <- paste0(x, '_FilteringNormalizationGraphs')
save(p1, p2, p3, p4, p5, p6, p7, p.discarded, discarded.gene.df,
file = file.path(output.path,
output.name))
output.name <- paste0(x, '_DGEList_filtered_normalized')
save(myDGEList.filtered.norm,
file = file.path(output.path,
output.name))
# Save matrix of genes and their filtered, normalized, voom-transformed counts ----
# This is the count data that underlies the differential expression analyses in the Shiny app.
# Saving it here so that users of the app can access the input information.
output.name <- paste0(x, '_log2cpm_filtered_norm_voom.csv')
# Save in Shiny app download (www) folder
write.csv(v.DGEList.filtered.norm$E,
file = file.path(www.path,
output.name))
# Save v.DGEList ----
# Save in Shiny app Data folder
output.name <- paste0(x, '_vDGEList')
save(v.DGEList.filtered.norm,
file = file.path(app.path,
output.name))
}
load(paste0("./Outputs/elegans_FilteringNormalizationGraphs"))
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/briggsae_FilteringNormalizationGraphs"))
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/brenneri_FilteringNormalizationGraphs"))
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/remanei_FilteringNormalizationGraphs"))
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/japonica_FilteringNormalizationGraphs"))
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/cele_embryonic_FilteringNormalizationGraphs"))
p1
p2
p3
p.discarded
DT::datatable(discarded.gene.df,
rownames = FALSE,
escape = FALSE,
options = list(autoWidth = TRUE,
scrollX = TRUE,
scrollY = '300px',
scrollCollapse = TRUE,
searchHighlight = TRUE,
pageLength = 10,
lengthMenu = c("5",
"10",
"25",
"50",
"100"),
initComplete = htmlwidgets::JS(
"function(settings, json) {",
paste0("$(this.api().table().container()).css({'font-size': '", "10pt", "'});"),
"}")
)) %>%
DT::formatRound(columns=c(2:(ncol(discarded.gene.df)-10)),
digits=3)
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/elegans_FilteringNormalizationGraphs"))
p4
p5
p6
p7
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/briggsae_FilteringNormalizationGraphs"))
p4
p5
p6
p7
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/brenneri_FilteringNormalizationGraphs"))
p4
p5
p6
p7
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/remanei_FilteringNormalizationGraphs"))
p4
p5
p6
p7
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/japonica_FilteringNormalizationGraphs"))
p4
p5
p6
p7
rm("p1", "p2", "p3", "p4", "p5", "p6", "p7","p.discarded", "discarded.gene.df")
load(paste0("./Outputs/cele_embryonic_FilteringNormalizationGraphs"))
p4
p5
p6
p7
sessionInfo()
sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dendextend_1.17.1 gridExtra_2.3 factoextra_1.0.7
## [4] ggdendro_0.1.23 ggforce_0.4.1 knitr_1.45
## [7] gprofiler2_0.2.2 ggthemes_5.0.0 cowplot_1.1.1
## [10] matrixStats_1.1.0 edgeR_4.0.2 limma_3.58.1
## [13] magrittr_2.0.3 data.table_1.14.8 lubridate_1.9.3
## [16] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
## [19] purrr_1.0.2 readr_2.1.4 tidyr_1.3.0
## [22] tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0
## [25] pacman_0.5.1 tximport_1.30.0 BiocManager_1.30.22
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 viridisLite_0.4.2 farver_2.1.1 viridis_0.6.4
## [5] fastmap_1.1.1 lazyeval_0.2.2 tweenr_2.0.2 digest_0.6.33
## [9] timechange_0.2.0 lifecycle_1.0.4 ellipsis_0.3.2 statmod_1.5.0
## [13] compiler_4.3.2 rlang_1.1.2 sass_0.4.7 tools_4.3.2
## [17] utf8_1.2.4 yaml_2.3.7 ggsignif_0.6.4 labeling_0.4.3
## [21] htmlwidgets_1.6.3 bit_4.0.5 abind_1.4-7 withr_2.5.2
## [25] grid_4.3.2 polyclip_1.10-6 fansi_1.0.5 ggpubr_0.6.0
## [29] colorspace_2.1-1 scales_1.3.0 MASS_7.3-60 cli_3.6.1
## [33] rmarkdown_2.25 crayon_1.5.2 generics_0.1.3 rstudioapi_0.15.0
## [37] httr_1.4.7 tzdb_0.4.0 cachem_1.0.8 parallel_4.3.2
## [41] vctrs_0.6.4 carData_3.0-5 jsonlite_1.8.7 car_3.1-2
## [45] hms_1.1.3 bit64_4.0.5 ggrepel_0.9.4 rstatix_0.7.2
## [49] crosstalk_1.2.1 locfit_1.5-9.8 plotly_4.10.3 jquerylib_0.1.4
## [53] glue_1.6.2 DT_0.30 stringi_1.8.2 gtable_0.3.4
## [57] munsell_0.5.0 pillar_1.9.0 htmltools_0.5.7 R6_2.5.1
## [61] vroom_1.6.4 evaluate_0.23 lattice_0.22-5 highr_0.10
## [65] backports_1.4.1 broom_1.0.5 bslib_0.6.1 Rcpp_1.0.11
## [69] xfun_0.41 pkgconfig_2.0.3